A transcriptomic signature for prostate cancer relapse prediction identified from the differentially expressed genes between TP53 mutant and wild-type tumors

For prostate cancer (PCa) patients, biochemical recurrence (BCR) is the first sign of disease relapse and the subsequent metastasis. TP53 mutations are relatively prevalent in advanced PCa forms. We aimed to utilize this knowledge to identify robust transcriptomic signatures for BCR prediction in patients with Gleason score ≥ 7 cancers, which cause most PCa deaths. Using the TCGA-PRAD dataset and the novel data-driven stochastic approach proposed in this study, we identified a 25-gene signature from the genes whose expression in tumors was associated with TP53 mutation statuses. The predictive strength of the signature was assessed by AUC and Fisher’s exact test p-value according to the output of support vector machine-based cross validation. For the TCGA-PRAD dataset, the AUC and p-value were 0.837 and 5 × 10⁻¹³, respectively. For five external datasets, the AUCs and p-values ranged from 0.632 to 0.794 and 6 × 10⁻² to 5 × 10⁻⁵, respectively. The signature also performed well in predicting relapse-free survival (RFS). The signature-based transcriptomic risk scores (TRS) explained 28.2% of variation in RFS on average. The combination of TRS and clinicopathologic prognostic factors explained 23–72% of variation in RFS, with a median of 54.5%. Our method and findings are useful for developing new prognostic tools in PCa and other cancers.


Supplementary Text 1: Description of external transcriptomic signatures
Wu's signature 1 : The 10-gene signature was identified to predict BCR in patients with GS≥7 cancer. The authors used 414 TCGA-PRAD samples as the discovery dataset. They first selected ~1300 differentially expressed genes (DEGs) between GS=6 and GS≥7 samples using the R package "edgeR". Then, from the DEGs, they identified 39 prognostic genes using LASSO (least absolute shrinkage and selection operator), Cox-PH regression and 10-fold cross-validation. Finally, the set of prognostic genes was further refined by multivariate Cox analysis, resulting in the reported signature.
Li's signature 2  pairs. Finally, the 74 most significant prognostic gene pairs were pinpointed using a forward selection procedure.

Komisarof's signature 3 :
The signature was identified from cooperation response genes (CRGs), which were synergistically dysregulated in response to cooperating oncogenic mutations 4 . The discovery cohort contained 32 PCa samples, whose gene expression intensity was quantified via TaqMan Low Density Array RT-PCR. The authors scanned the 95 CRGs to estimate t-test p-values for the differences between the patients who experienced BCR and those who did not. The combined performance of the most significant genes in predicting BCR was evaluated using three classification algorithms. When the p-value cutoff for defining the most significant genes was set at 0.01, the selected signature of the 4 top genes showed the best performance.
Erho's signature 5 : The signature consisted of 22 genome fragments (features), located in the coding or noncoding regions of 19 genes. It was identified to predict early metastasis following RP. The discovery cohorts contained 359 samples. Each feature corresponded to a probe set on Human Exon 1.0 ST GeneChips. The authors first selected 18,902 differentially expressed features between cases (with metastasis) and controls (without metastasis) using t-tests. Then, the initially selected feature set was reduced to a smaller set of 43 features using regularized logistic regression. Finally, the 43 features were filtered to only those that improved a random forest-based performance metric.

Knezevic-Klein's signature 6,7 : The 17-gene signature was identified to predict clinical recurrence, prostate cancer death, and adverse pathology. The discovery cohort consisted of 441 patients. Initially, 732 candidate genes were selected through a meta-analysis of several public microarray datasets (GSE3933, GSE10645, GSE5132 and GSE3325), in which gene expression levels were measured using multiple platforms, including the Affymetrix Human Genome U133 Plus 2.0 Array and others. The list of candidate genes was refined by comprehensive bioinformatics approaches using the data of the discovery cohort, in which the gene expression levels of the prostatectomy samples were measured by TaqMan quantitative reverse transcription-polymerase chain reaction assays.

Supplementary Fig. S6: The performance of the immune-related prognostic transcriptomic signature for BCR prediction in the TCGA-PRAD dataset and five external datasets, i.e. GSE54460 and others. The suffixes "-linear", "-polynomial" and "-radial" indicate the kernel functions used in the SVM models. The output BCR label (1 or -1) and the numeric decision value, i.e. the transcriptomic risk score (TRS), of each patient in the GSE70769 dataset were predicted by the model trained using the GSE70768 dataset. For the patients in the other cohorts, the labels and scores were predicted via LOOCV. Together with the actual BCR labels, the output BCR labels and TRSs were used to calculate a 2×2 contingency table for estimating the p-value and to generate the ROC curve, respectively. Sn and Sp denote sensitivity and specificity, respectively.
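
The evaluation step described in the caption can be illustrated with a minimal Python sketch. It assumes that the predicted BCR labels (1 or -1) and TRSs have already been produced by an SVM; the SVM training and LOOCV themselves are not reproduced here. The function names and the toy data are hypothetical and are not taken from the study, and the use of a two-sided Fisher's exact test is an assumption, since the source does not state the sidedness.

```python
from math import comb


def contingency_table(actual, predicted):
    """Return (tp, fn, fp, tn) for labels coded 1 (BCR) and -1 (no BCR)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == -1)
    fp = sum(1 for a, p in pairs if a == -1 and p == 1)
    tn = sum(1 for a, p in pairs if a == -1 and p == -1)
    return tp, fn, fp, tn


def fisher_exact_two_sided(tp, fn, fp, tn):
    """Two-sided Fisher's exact p-value (assumed sidedness).

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = tp + fn, fp + tn, tp + fp
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(tp)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def roc_auc(actual, scores):
    """AUC as the fraction of (BCR, non-BCR) pairs ranked correctly by score."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == -1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): actual labels, TRSs, and labels from the TRS sign.
actual = [1, 1, 1, 1, -1, -1, -1, -1]
trs = [1.2, 0.8, 0.3, -0.4, 0.1, -0.5, -0.9, -1.3]
predicted = [1 if s > 0 else -1 for s in trs]

tp, fn, fp, tn = contingency_table(actual, predicted)
sn = tp / (tp + fn)   # sensitivity (Sn)
sp = tn / (tn + fp)   # specificity (Sp)
p_value = fisher_exact_two_sided(tp, fn, fp, tn)
auc = roc_auc(actual, trs)
print(f"Sn={sn:.2f} Sp={sp:.2f} p={p_value:.4f} AUC={auc:.4f}")
```

The p-value depends only on the discrete labels, whereas the AUC uses the continuous TRSs, which is why the two metrics can rank signatures differently.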